Improving free-energy estimates from unidirectional work measurements: theory and 

experiment 
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We derive analytical expressions for the bias of the Jarzynski free-energy estimator from N 
nonequilibrium work measurements, for a generic work distribution. To achieve this, we map the 
estimator onto the Random Energy Model in a suitable scaling limit parametrized by (log N) //x, 
where n measures the width of the lower tail of the work distribution, and then compute the finite- N 
corrections to this limit with different approaches for different regimes of (logiV)/^. We show that 
these expressions describe accurately the bias for a wide class of work distributions, and exploit 
them to build an improved free-energy estimator from unidirectional work measurements. We apply 
the method to optical tweezers unfolding/refolding experiments on DNA hairpins of varying loop 
size and dissipation, displaying both near-Gaussian and non-Gaussian work distributions. 

PACS numbers: 02.50.-r,05.40.-a,05.70.Ln 



The accurate measurement of free-energy changes has 
important applications in physics, chemistry, and bi- 
ology. Traditional measurement methods rely on re- 
versible, near-equilibrium transformations, which how- 
ever are often unfeasible. In recent years, new results in 
nonequilibrium statistical mechanics have suggested ways 
to measure free-energy changes from experiments (and 
simulations) far from equilibrium (see [if for review). 
The Crooks fluctuation theorem (CFT) [2[ states that 
the probability distribution p(W) of the work W done on 
a system driven out of equilibrium following an arbitrary 
finite-time protocol obeys the relation p(W)/pn(—W) = 
e (w-AF)/k B T _ Herej pfi(yV*) is the work distribution 
(WD) for the corresponding time-reversed protocol, AF 
is the free-energy difference between the final and ini- 
tial equilibrium states Q, and T is the temperature. 
Hence, AF can be estimated in bidirectional experiments 
by repeating many times the forward and reverse proto- 
col, as demonstrated using single-molecule manipulation 
techniques 0, [H[ . An asymptotically unbiased estimator 
based on the CFT is the acceptance ratio (AR) estimator 

In many experimental settings, which we shall call uni- 
directional, the reverse work cannot be measured. Ex- 
amples arc found in AFM pulling of biopolymers 0, S] , 
steered simulations Q , free-energy landscape reconstruc- 
tion [Io[ , and single-molecule experiments on protein un- 
binding, intercalation, specific cation binding, antigen- 
antibody interactions, and non-native protein conforma- 
tions. In these cases, an alternative method is provided 
by a corollary of the CFT, the Jarzynski equality (JE) 
( e -w/k B T^ _ e -AF/k B T^ wnerc ( . ) i s the expectation 

over p(W) [1] . Given N work measurements W\,. . . , Wn 
under the same protocol, the Jarzynski estimator 
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converges to AF from above as N — > oo (here and 



henceforth we set k B T = 1 and express all work val- 
ues in units of fc^T at room temperature). In practice, 
convergence of AFjv requires that rare trajectories with 
Wi < AF be sufficiently represented, which in turn re- 
quires N 3> exp(D t yp), where D typ is the typical value 
of the dissipated work, D = W — AF [ll|. Therefore, 
AFjy is a reliable estimator of AF only when D typ is not 
much larger than k B T. It is thus important to have a 
quantitative estimate of the bias Bn = AFn — AF. The 
mathematical problem faced is that of calculating the 
distribution of a (log)sum of exponentials of i.i.d. ran- 
dom variables, Eq.lJTJ, which depends on the system- and 
protocol-specific WD. No closed solution to this problem 
is available even for a Gaussian WD (GWD). Ex- 
pansions in A^ -1 [ill, U are only applicable when the 
bias is of order A^ -1 , i.e. smaller than the 0(N~i) sta- 
tistical error and thus negligible. In the relevant regime 
Bn » 0(N^ 1 ), power-law interpolations in N Q and 
other approximations [l2[ have been discussed, but no 
reliable analytical theory exists. 

In this Letter, we derive analytical expressions for the 
bias expectation (-Bjv) for a wide class of WD's and val- 
idate them by comparison with exact numerical simula- 
tions, also in the regime of large bias. We use these re- 
sults to build an improved unidirectional free-energy es- 
timator by correcting for the bias of Eq.([TJ). We then dis- 
cuss unfolding/refolding experiments on DNA hairpins, 
which allow us to test our method against the bidirec- 
tional AR estimator. 

The experimental setup is shown in Fig.[lja). We syn- 
thesized five hairpins (A,B,C,D,E) with identical stem 
and (GAAA...) loops of 4,6,12,16,20 bases, respectively 
[FigHt]. The hairpins are inserted between two short 
(29bp) dsDNA handles to improve signal-to-noise reso- 
lution [l(J. The construct is tethered to two beads, one 
held by a pipette, the other by an optical trap created by 
countcrpropagating laser beams 17| . The light deflected 
by the trapped bead provides a direct measurement of 
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the force acting on the molecule. By moving the trap 
away from the pipette at constant velocity, the hairpin 
is stretched until it unfolds. Subsequent reversal of the 
velocity causes the hairpin to refold. By repeating this 
cycle (ss 200 — 1000 times per experiment) we collect the 
histogram of the WD's Pu,r(W) for the work to unfold 
(U) and refold (R) the hairpin, measured by integrating 
the force-distance curves (FigQj:) for the forward and re- 
verse part of each cycle (see Sec. 1 in [l8| for details). 
We divide the data in blocks of N cycles, compute AFjv 
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FIG. 1: Bias measurements in DNA hairpins, a) 

Experimental setup, b) Hairpin B sequence, c) Exam- 
ples of force-distance cycles for hairpins A-E. d) Left: his- 
tograms of pu(W) (upper), Pr(— W) (lower) for hairpin A 
pulled at 400 nm/s. The horizontal line is the AR estimate 
AF A r = 540.5 fc s T, giving 1(D) | = \(W) - AF AR \=4.3 k B T 
(U), 3.9 k B T (R). The lines are GWDs fitting the lower (up- 
per) tail of pu(W) (pr( — W)). Right: Jarzynski estimator 
(AFjv) as a function of N for U and R. Errors are estimated 
by jackknife. The lines represent AFar + (Bn) with (Bn) 
given by Eq.© (dashed line) and Eq.© (continuous line) for 
the GWD case, assuming D c = (D) = fi, where fj, is estimated 
from the GWD fit to the tails, e) Same as d) but for hairpin 
C pulled at 65 nm/s [|(D)|=13.6 k B T (U); 14.8 k B T (R)]. Also 
shown is AFar + (Bn) with (Bn) given by Eq.(0 (dotted 
line) . 

for each block, and average over the blocks to estimate 
(AFn) for U and R separately. As shown in FigOJd,e) 
for hairpins A and C, (AFn) tends to the AR estimate 
AFar for large N, from opposite sides for U and R. Note 
that the dissipation increases with loop size and pulling 
speed. 

We analyze theoretically the bias for a generic WD 
with finite mean and an unbounded lower tail which de- 
cays as 



p(W) 



W-W r 



■ cxp 



\W-Wc 



(2) 



for W <C W c , where W c is a characteristic work value, 
51 > measures the tail width and q is a normalization 
constant. For the JE to hold, generally one must have 
S > 1 [jgj . Two key parameters in the following are 
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point calculation gives (e 
x ) — kD c ) for large fi, where D c = W c 
Hence the JE implies fj, — > D c in this limit. An ex- 
ample of a WD obeying Eq.® is a GWD with mean 
(W) = W c and variance er 2 ^ = Q 2 /2 (i.e. a = 0, 8 = 2, 
q = 7T -1 / 2 ), for which (e~ kD ) = exp(ju/c 2 — k(D)) and 
thus [i = (D) = D c = exactly for all Q. This 

relation allows one to define another unidirectional 
estimator AF V = (W) ~ cr^/2 [U, since AF V = AF for 
the GWD. 

Scaling limit - Our strategy consists in comput- 
ing first (Bn) in a suitable scaling limit, and then 
the finite- N corrections to this limit. We obtain 
the scaling limit by mapping the problem onto the 
Random Energy Model (REM) [U as B N = D c + 
log N - logZjv(/3 = 0/(log 2 A0 (5 - 1)/<5 ), where Z N (P) = 

E l =i ex P[-/ 3 ( 1 °g2^) ( ' 5 " 1)/5 ^] is thc R- EM partition 
function and the i.i.d. variables Ei have a distribution 
decaying as \E\~ a exp(-\E\ s ) for E < -1 [UHl]. From 
the known limit of A^ 1 (log Z N (fi)) for N ->■ oo ^2, [H 
we obtain (Bn) -» B REM in the scaling limit (N, O) — >• oo 
with A finite. For A > 1, all terms in Zn give a finite 
contribution and we find B IiEM = D c — /i. For A < 1, 
corresponding to the glass phase of the REM [22] , Zn is 
dominated by a finite number of terms and we obtain 
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= D c + \ogN -il(\ogN) 1/s . 

(4) 

Figure [2^ shows the approach to the scaling limit as f2 
increases, for the GWD case. Significant deviations occur 
for moderate f2, from which the need to compute finite- A^ 
corrections is apparent. 

Finite-N corrections - One must distinguish three 
regimes, which require different analytical approaches: 
A > 1, A <C 1, and A < 1. For A > 1, by partially resum- 
ming the 1/N expansion we obtain a closed expression for 
(Bn) that improves considerably over the truncated ex- 
pansions previously considered jl 31 ] , which are valid only 
for A S> 1 (see Sec. 2.1 in [3]). However, the most rel- 
evant regime in applications of thc JE is A < 1, since 
in practice one usually has A" <C exp(D typ ) . In the limit 



A< 1, using an extreme- value approach (Sec. 2.2 in 18() 
we obtain to leading order 



(Bn) — B REM — X' 



1 — a — S 



IE + jj log log N + \og(q/5) 

(5)' 

where 7r is the Euler-Mascheroni constant. Cook and 
Derrida [24[ were able to compute the finite-A^ correc- 
tions in the critical region A < 1 for the GWD case in the 
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FIG. 2: Finite-Af corrections to the bias for a GWD. 

a) Convergence of (-Bjv) to its scaling limit. The points 
joined by lines are averages of Bn = — log iV -1 £V e~ Di over 
many sets (Di, . . . ,Dn) sampled from a GWD with variance 
Q, 2 /2 and mean (D) = f2 2 /4. The dashed line represents 
B RE m = - A 1/2 ) 2 [(Eq.Q for 5 = 2]. Inset of a): The 
unsealed data. The continuous lines represent Eq.©. The 
horizontal dashed line {Bn) = 1 indicates the accuracy limit 
common in biophysical studies, b) Bias |{-Bjv)| (estimated as 
|{AFjv} — AFar\) for all experiments on hairpins A,B,C and 
iV = 4, 16, 64, including both U and R. The pulling speed is 
in the range 25 - 300 nm/s and \(D)\ is in the range 1-20 
k B T. The lines show Eq.© for the GWD. 



context of the REM. We have extended their traveling- 
wave approach to the more general case of Eq. ([2]) . In this 
way (see Sec. 2.3 in [HI for details) we recover Eq.© for 
A <C 1, while for A < 1 we obtain 
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where erfc is the complementary error function and 6 = 
(A 1 ^ - l)^5M[N/^2(5-l). 

Numerical test - Equations ([5]) and (J6j> provide, via 
Eqs.© and ((H), explicit expressions for (Bn) as a func- 
tion of D c ,log-/V, and the shape parameters 5, f2, a, q 
of the WD tail. To illustrate the validity of these ex- 
pressions, we computed numerically (Bn) by sampling 
Wi from the GWD and from the Weibull WD (WWD): 
p(W) = 5Vt- 5 \W -W^ 5 - 1 exp-QW - W c \/Q) 5 for W < 
W c ; p(W) = for W > W c , where W c is fixed numer- 
ically by imposing the JE. The WWD satisfies Eq.® 
(a = 1 — 6, q = 6) and allows us to model tails falling 
faster (6 > 2) or more slowly (S < 2) than a GWD. In this 



In their respective range of validity in A, Eqs.([5]) and 
© agree very well with the numerical data for the entire 
range tested (1.1 < 5 < 3, 1.41 < fi < 16) also for 
large bias, as shown for some cases in Fig. 3(a) (for more 
examples see Fig.S4 in [H|)- Furthermore, substituting 
D c with /i in Eq.@ worsens only slightly the agreement, 
an imp ortant observation for the following (see Sec. 2.5 
in [18|). In the special case of the GWD, Eq.([6]) gives 
(Bn) = log 2 at A = 1. The empirical one-parameter 
power law 

(B N ) = (D)N- Z , z = -(D)- 1 \og(\og2/(D)) 7 (7) 

interpolating between A = 1 and A = (for which 
(Bn) = (D)) also fits fairly well the GWD data, as shown 
in Fig. 2, although less so than Eq.© for large ft (see also 
Fig.S4 in [l8l | ). A power-law fit of the bias in N was pro- 
posed in Ref.[l4J. Figure 2(b) shows that the bias of all 
our experiments with mild dissipation is well described 
by Eq.© for a GWD. 
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FIG. 3: free-energy recovery for non-Gaussian WDs. 

a) Analytical estimates of {Bn) as given in Eq.([5| (dashed 
lines) and Eq.© (dotted lines) with D c replaced by fi, tested 
against simulated data for the WWD (symbols). From top to 
bottom, (<5,n, (D))= (1.1, 1.83, 31.0), (1.1, 1.68, 13.5), (1.5, 6, 
30.1), (1.5, 4, 8.8). b) WD for hairpin E pulled at 130 nm/s 
and fitted WD tails assuming S = 1.5 (dashed lines). The 
vertical line represents AFar- c) {AFn) for U (circles) and 
R (triangles). Note the very slow convergence. The improved 
estimator AF N obtained by correcting for the bias with Eq.© 
(squares) and Eq.® (stars) using 5 = 1.5 converges quickly to 
a value consistent with AFar = IGOksT (dashed horizontal 
line) d) AF N recovered from U and R as a function of S for 
N = 182. Using Eq.© (dashed lines) or Eq.© (continuous 
lines) gives nearly the same result. 

Free-energy recovery method - The fact that Eqs.© 
and © describe well the bias when D c = fi suggests an 
improved estimator AF^ applicable to any problem in- 
volving the logarithm of an exponential average: i) given 
iV" measurements Wi, compute AFjv and the histogram 
of the WD; ii) estimate the shape parameters fi, S, a, q 
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(and thus fi) by fitting the histogram tail to Eq. ((2]) , tak- 
ing for instance the maximum of the WD for W c [l5j ; 
Hi) define AF^ = AF N — (B N ) taking for (B N ) either 
Eq.© or ^ (depending on the value of A) and setting 
D c = /i. 

In the special case of the Jarzynski estimator, we must 
take into account the stronger constraint imposed by the 
CFT, which implies that (e~ w ) is dominated by values 
of W near the maximum of the reverse WD |25j |. 
Sampling these very rare events is usually unfeasible, so 
there is no guarantee that the fit to the measured p(W) 
will continue to hold near W' . Nevertheless, we argue 
that a distinction should be made between near-Gaussian 
and non-Gaussian tails. In the former case (i.e. when 
the tail can be fitted with 5 w 2), it is reasonable to 
assume that the fit will hold near W\ hence, AF^ should 
give a good estimate. The distance from a GWD can 
be self-consistently quantified a posteriori by the ratio 
r = er^/(2(L>)) (r = 1 for a GWD), taking (D) = (W) - 
AF£f. 

We return now to our DNA experiments, for 
which we can compare the unidirectional estimators 
AF N ,AF V ,AF^ separately for U and R with the bidi- 
rectional estimator AFar- Figure 1(d) shows a case with 
mild dissipation, for which the bias of AF/v is small and 
all estimators converge. Figure 1(e) shows an experi- 
ment with intermediate dissipation. In this case sam- 
pling the tail of pu near the maximum of pr (or vicev- 
ersa) would require an unfeasible number of cycles. Nev- 
ertheless, both tails are well fitted with <5 = 2, a = 0. 
[They are not perfect GWD, being slightly asymmetric. 
Using (D) = (W) - AF AR we obtain r = 0.67 (U), 0.81 
(R).] The curves in Fig. 1(e) represent AFar + (Bn) with 
(B N ) given by Eq.©, ©, Q. We find that AF^ agrees 
with AFar within its statistical error for N > 5, for 
all three expressions. This represents a significant im- 
provement over the variance estimator, which has a bias 
AF V - AF A r = 4.3 ± 0.8 (U), -2.9 ± 0.8 (R), and over 
the uncorrected Jarzynski estimator. For instance, we 
have AF N - AFar = 5.8 ± 3.0 (U), -6.6 ± 2.0 (R) for 
N = 36, and AF N - AFar = 5.1 ± 0.6 (R) for N = 289. 
Furthermore, the fitted WD satisfy fairly well the CFT 
(see Sec. 4 in [Hj]), giving further confidence in the con- 
sistency of the method. 

Finally, we consider an experiment where the tails are 
far from a GWD and the dissipation is large (FigJ3]). In 
this case the WDs are wide apart [FigfSJb)] and (AFn) 
converges very slowly [(Figf3jc)]. Equation (|2|) fits rea- 
sonably well the WD tails for a fairly broad range of 5 
values (we take a = for simplicity [15|). The estimator 
AF^ is shown in FigjSJi: the pronounced dependence on 
5 and the discrepancy between U and R confirm that the 
predictive power of Eqs. (|5l6[) relies on accurately knowing 
5 (see Sec. 3 in [l8| for other examples). Note also that 
Eqs. (|5l6p are ill-defined as the exponent 6 approaches 1 
[l9( . As 8 decreases the fitted tails satisfy less and less the 
CFT (see Sec. 4 in [l8j]). signaling that 5 must increase 



further out in the tails. It is an open problem to general- 
ize our analytical approach to an effective 6 varying with 
W. 

In summary, we obtained analytical expressions for the 
bias of the Jarzynski estimator, and showed that they 
can be used to obtain improved unidirectional estimates 
of the free energy of mechanical unfolding of DNA hair- 
pins, provided the WD tail is described by a compressed 
exponential over a wide enough range of work values. 
These results are applicable to many unidirectional ex- 
periments and simulations, and are relevant to other con- 
texts involving sums of random exponentials. 
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